Conversation

@jukkar
Member

@jukkar jukkar commented Mar 6, 2025

This is a backport of #86553 to v4.0. It omits commit d5335d3 because the feature that commit fixes is not present in v4.0.

To avoid mutex deadlocks between iface->lock and the TX lock, release the interface lock before calling any function that acquires the TX lock. See the previous commit for a similar issue in RS timer handling. Here we build a separate list of the multicast addresses that must be rejoined when the network interface comes up, and then rejoin those groups without holding iface->lock.
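
As a rough illustration of the pattern described above (not the actual net_if.c change), the sketch below snapshots the groups to rejoin while holding a stand-in for iface->lock, drops that lock, and only then calls a function that takes a stand-in TX lock. Every `demo_*` name, the toy lock type, and the `joined` flag are assumptions made up for this example; none of them are Zephyr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_MCAST 4

/* Toy single-threaded lock (hypothetical): it only records whether it is held. */
struct demo_lock {
	bool held;
};

static void demo_lock_take(struct demo_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void demo_lock_give(struct demo_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

struct demo_mcast {
	int group_id;
	bool in_use;
	bool joined;
};

struct demo_iface {
	struct demo_lock lock;     /* stands in for iface->lock */
	struct demo_lock tx_lock;  /* stands in for the TX lock */
	struct demo_mcast mcast[DEMO_MAX_MCAST];
	int sends_with_iface_lock_held; /* bookkeeping for the demo only */
};

/* Hypothetical send path: takes the TX lock, as a real join would. */
static void demo_send_join(struct demo_iface *iface, int group_id)
{
	(void)group_id;

	/* Record whether the caller still holds the interface lock. */
	if (iface->lock.held) {
		iface->sends_with_iface_lock_held++;
	}

	demo_lock_take(&iface->tx_lock);
	/* ... a real stack would transmit the join here ... */
	demo_lock_give(&iface->tx_lock);
}

/* Returns the number of groups for which a join was sent. */
int demo_rejoin_groups(struct demo_iface *iface)
{
	int pending[DEMO_MAX_MCAST];
	int count = 0;

	/* Phase 1: collect the groups to rejoin under the interface lock. */
	demo_lock_take(&iface->lock);
	for (size_t i = 0; i < DEMO_MAX_MCAST; i++) {
		if (iface->mcast[i].in_use && !iface->mcast[i].joined) {
			pending[count++] = iface->mcast[i].group_id;
		}
	}
	demo_lock_give(&iface->lock);

	/* Phase 2: send the joins with the interface lock released. */
	for (int i = 0; i < count; i++) {
		demo_send_join(iface, pending[i]);
	}

	return count;
}
```

The local `pending` array plays the role of the "separate list": it costs a small copy but means the two locks are never nested on this path.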

Fixes #86499

@jukkar jukkar added the area: Networking and Backport labels Mar 6, 2025
@jukkar jukkar added this to the v4.0.1 milestone Mar 6, 2025
@jukkar jukkar added this to Backports Mar 6, 2025
@github-project-automation github-project-automation bot moved this to To do in Backports Mar 6, 2025
@jukkar jukkar requested review from pdgendt and rlubos March 6, 2025 17:03
@dkalowsk
Contributor

@jukkar I somehow missed this. I'm looking to merge it for the impending release, but it is not passing CI. Can you take a look?

@jukkar
Member Author

jukkar commented Apr 30, 2025

This is a Python pip issue that looks unrelated to this PR.
I don't think there is anything I can do here.

....
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/sdist.py", line 386, in prune_file_list
          base_dir = self.distribution.get_fullname()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_core_metadata.py", line 272, in get_fullname
          return _distribution_fullname(self.get_name(), self.get_version())
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_core_metadata.py", line 290, in _distribution_fullname
          canonicalize_version(version, strip_trailing_zero=False),
      TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

jukkar added 5 commits April 30, 2025 13:30
Add documentation for DAD start time variable as it was missing.

Signed-off-by: Jukka Rissanen <[email protected]>

Add documentation for ACD timeout variable as it was missing.

Signed-off-by: Jukka Rissanen <[email protected]>

The net_if.c:rs_timeout() function sends a new IPv6 router
solicitation (RS) message by calling net_if_start_rs(). That function
acquires iface->lock and calls net_ipv6_start_rs(), which tries to
send the RS message and acquires the TX send lock.
During this RS send, we might receive TCP data, and the stack may then
try to send an ACK to the peer, which also acquires the TX lock.
Depending on timing, the RX thread and the system workqueue can take
the two locks in different orders, which can lead to a deadlock.
Fix this by releasing iface->lock before starting the RS sending
process. net_if_start_rs() does not need to hold the interface lock
for long, as it is the only one sending the RS message.

Fixes zephyrproject-rtos#86499

Signed-off-by: Jukka Rissanen <[email protected]>
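
To make the lock-ordering argument in the commit message above concrete, here is a small single-threaded sketch that records which lock was taken while another was held and flags the case where both nesting orders have been seen. It is only a model: the tracker, every `demo_*` name, and in particular the assumption that the RX/ACK path takes the TX lock first and the interface lock second are made up for illustration and are not taken from the Zephyr sources.

```c
#include <stdbool.h>

enum demo_lock_id {
	DEMO_IFACE_LOCK = 0,
	DEMO_TX_LOCK = 1,
	DEMO_LOCK_COUNT
};

/* Hypothetical tracker, not a Zephyr facility. */
struct demo_order_tracker {
	bool held[DEMO_LOCK_COUNT];
	/* edge[a][b] is set once lock b was taken while lock a was held. */
	bool edge[DEMO_LOCK_COUNT][DEMO_LOCK_COUNT];
};

static void demo_take(struct demo_order_tracker *t, enum demo_lock_id id)
{
	for (int other = 0; other < DEMO_LOCK_COUNT; other++) {
		if (t->held[other]) {
			t->edge[other][id] = true;
		}
	}
	t->held[id] = true;
}

static void demo_give(struct demo_order_tracker *t, enum demo_lock_id id)
{
	t->held[id] = false;
}

/* True once both nesting orders were observed (the deadlock precondition). */
bool demo_has_inversion(const struct demo_order_tracker *t)
{
	return t->edge[DEMO_IFACE_LOCK][DEMO_TX_LOCK] &&
	       t->edge[DEMO_TX_LOCK][DEMO_IFACE_LOCK];
}

/* Assumed RX path: sends an ACK under the TX lock, then touches iface state. */
void demo_rx_ack_path(struct demo_order_tracker *t)
{
	demo_take(t, DEMO_TX_LOCK);
	demo_take(t, DEMO_IFACE_LOCK);
	demo_give(t, DEMO_IFACE_LOCK);
	demo_give(t, DEMO_TX_LOCK);
}

/* RS timer path before the fix: TX lock taken while iface lock is held. */
void demo_rs_timer_nested(struct demo_order_tracker *t)
{
	demo_take(t, DEMO_IFACE_LOCK);
	demo_take(t, DEMO_TX_LOCK);
	demo_give(t, DEMO_TX_LOCK);
	demo_give(t, DEMO_IFACE_LOCK);
}

/* RS timer path after the fix: iface lock released before the send. */
void demo_rs_timer_unnested(struct demo_order_tracker *t)
{
	demo_take(t, DEMO_IFACE_LOCK);
	/* ... update RS bookkeeping under the interface lock ... */
	demo_give(t, DEMO_IFACE_LOCK);

	demo_take(t, DEMO_TX_LOCK);
	/* ... send the RS message ... */
	demo_give(t, DEMO_TX_LOCK);
}
```

Running the nested timer path together with the assumed RX path records both orders, while the unnested variant leaves only one, so no cycle between the two locks can form on this path.
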

To avoid mutex deadlocks between iface->lock and the TX lock,
release the interface lock before calling any function that
acquires the TX lock. See the previous commit for a similar issue
in RS timer handling.

Signed-off-by: Jukka Rissanen <[email protected]>

To avoid mutex deadlocks between iface->lock and the TX lock,
release the interface lock before calling any function that
acquires the TX lock. See the previous commit for a similar issue
in RS timer handling. Here we build a separate list of the ACD
addresses to be started when the network interface comes up, and
start ACD for them without holding iface->lock.

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar jukkar force-pushed the backport-86553-to-v4.0-branch branch from 3e86312 to 0cbb687 Compare April 30, 2025 10:31
@jukkar
Member Author

jukkar commented Apr 30, 2025

I rebased on top of the latest v4.0-branch and resubmitted; let's see what happens with CI now.

@jukkar
Member Author

jukkar commented Apr 30, 2025

@dkalowsk My rebase did not help, dunno what is going on here ¯\_(ツ)_/¯

@github-project-automation github-project-automation bot moved this from To do to Done in Backports May 9, 2025
@fabiobaltieri fabiobaltieri reopened this May 9, 2025
@github-project-automation github-project-automation bot moved this from Done to Needs more info in Backports May 9, 2025
@zephyrbot zephyrbot requested review from ssharks and tbursztyka May 9, 2025 10:36
@sonarqubecloud

sonarqubecloud bot commented May 9, 2025

@dkalowsk dkalowsk merged commit 215737c into zephyrproject-rtos:v4.0-branch May 9, 2025
48 of 53 checks passed
@github-project-automation github-project-automation bot moved this from Needs more info to Done in Backports May 9, 2025
@jukkar jukkar deleted the backport-86553-to-v4.0-branch branch May 15, 2025 09:39
Labels

area: Networking, Backport

Projects

Status: Done

5 participants